Temporal-Difference Learning

释义 Definition

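Temporal-difference learning (TD learning): a family of reinforcement-learning methods that update value estimates using the difference between predictions at successive time steps (the "temporal-difference error", or TD error). It combines the Monte Carlo idea of learning directly from sampled experience with the dynamic-programming idea of bootstrapping (updating an estimate from other current estimates, such as the estimated value of the next state), so it can update after every step instead of waiting for the episode to end. Common forms include TD(0) and TD(λ); control algorithms such as SARSA and Q-learning are built on the same TD update.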
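The TD error and the updates named in the definition can be written out as follows. This is a sketch in the standard Sutton and Barto notation, which the entry itself does not define: S_t is the state at step t, A_t the action taken, R_{t+1} the reward received on that transition, γ the discount factor, α the step size, V a state-value estimate and Q an action-value estimate.

```latex
\begin{aligned}
\delta_t &= R_{t+1} + \gamma V(S_{t+1}) - V(S_t) && \text{(TD error)} \\
V(S_t) &\leftarrow V(S_t) + \alpha\,\delta_t && \text{(TD(0) update)} \\
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \alpha\bigl[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\bigr] && \text{(SARSA)} \\
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \alpha\bigl[R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\bigr] && \text{(Q-learning)}
\end{aligned}
```

Only the bootstrapped target changes between the methods: SARSA uses the value of the action actually taken next, Q-learning the value of the greedy action. By convention the value of a terminal state is 0.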

发音 Pronunciation (IPA)

/ˈtɛmpərəl ˈdɪfərəns ˈlɜːrnɪŋ/

例句 Examples

I used temporal-difference learning to estimate the value of each state.
我用时序差分学习来估计每个状态的价值。

Temporal-difference learning updates predictions online by minimizing the error between consecutive estimates, which makes it effective in long tasks with delayed rewards.
时序差分学习通过最小化相邻估计之间的误差来进行在线更新，因此在奖励延迟、任务很长的场景中很有效。
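To make "online" and "delayed reward" concrete, here is a minimal tabular TD(0) sketch. The environment is an assumption, not part of the entry: a five-state random walk (non-terminal states 1 to 5, terminals 0 and 6, start in state 3) whose only nonzero reward is +1 on reaching state 6, so the reward is delayed until the end of each episode. The function name `td0_random_walk` and its default parameters are hypothetical.

```python
import random


def td0_random_walk(episodes=3000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a hypothetical 5-state random walk.

    States 1..5 are non-terminal, 0 and 6 are terminal; each step moves
    left or right with probability 0.5; reward is +1 only when entering 6.
    Returns the value estimates for states 1..5.
    """
    rng = random.Random(seed)
    # Terminal states (indices 0 and 6) keep value 0; others start at 0.5.
    V = [0.0] + [0.5] * 5 + [0.0]
    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD error: bootstrapped target r + gamma * V(s') minus V(s).
            delta = r + gamma * V[s_next] - V[s]
            # Update immediately, without waiting for the episode to end.
            V[s] += alpha * delta
            s = s_next
    return V[1:6]


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

Although the reward arrives only at the last step, each state's estimate changes as soon as its successor is observed; with γ = 1 the estimates should settle near the true values 1/6, 2/6, ..., 5/6.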

词源 Etymology

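"Temporal" means "relating to time" and "difference" refers to a difference between values, so the name stresses the difference between predictions across time steps. The term was systematized in reinforcement-learning research and is especially associated with the work of Richard S. Sutton and colleagues. Its core idea is to update the current prediction toward the prediction at the next time step (or toward a combination of observed rewards and predictions), which is known as bootstrapping.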
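The "combination of rewards and predictions" variant corresponds to TD(λ). Below is a minimal backward-view sketch with accumulating eligibility traces, reusing the same hypothetical five-state random walk (terminals 0 and 6, reward +1 only on entering 6). The function name `td_lambda_random_walk`, λ = 0.5, and the other defaults are illustrative assumptions, not values from the entry.

```python
import random


def td_lambda_random_walk(lam=0.5, episodes=5000, alpha=0.01, gamma=1.0, seed=1):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Hypothetical environment: states 1..5 non-terminal, 0 and 6 terminal,
    start in state 3, reward +1 only when entering state 6.
    Returns the value estimates for states 1..5.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]
    for _ in range(episodes):
        z = [0.0] * 7  # eligibility traces, reset at the start of each episode
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            z[s] += 1.0  # accumulate trace for the state just left
            for i in range(1, 6):
                # Every recently visited state receives a share of the
                # TD error, weighted by its (decaying) trace.
                V[i] += alpha * delta * z[i]
                z[i] *= gamma * lam  # decay traces before the next step
            s = s_next
    return V[1:6]


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

Setting `lam=0` makes every trace vanish after one step, which reduces the loop to the TD(0) update; larger λ spreads each TD error further back along the visited states.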

文学与经典著作中的用例 Literary Works